History-Based Controller Design and Optimization for Partially Observable MDPs

Authors

  • Akshat Kumar
  • Shlomo Zilberstein

Abstract

Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and an adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to the FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches.
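
For context, the dual linear program for MDPs that the abstract refers to is commonly written over occupancy variables. The block below is a sketch of that standard textbook form for a discounted MDP, not the paper's own MIP for FSCs; the symbols $x(s,a)$, $R(s,a)$, $P(s' \mid s,a)$, $b_0$, and $\gamma$ are notation assumed here, since the abstract does not define any.

```latex
\begin{align*}
\max_{x \ge 0} \quad & \sum_{s,a} R(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) = b_0(s') \quad \forall s'
\end{align*}
```

Here $x(s,a)$ is the expected discounted number of times action $a$ is taken in state $s$ under the initial state distribution $b_0$, and a policy can be read off as $\pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a')$ wherever the denominator is positive.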

Related resources

Multiple-Environment Markov Decision Processes

We introduce Multi-Environment Markov Decision Processes (MEMDPs) which are MDPs with a set of probabilistic transition functions. The goal in a MEMDP is to synthesize a single controller with guaranteed performances against all environments even though the environment is unknown a priori. While MEMDPs can be seen as a special class of partially observable MDPs, we show that several verificatio...

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that any finite-state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state,...

Linear Dynamic Programs for Resource Management

Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management probl...

Final Performance Report Grant FA

The researchers made significant progress in all of the proposed research areas. The first major task in the proposal involved simulation-based and sampling methods for global optimization. In support of this task, we have discovered two new innovative approaches to simulation-based global optimization; the first involves connections between stochastic approximation and our model reference appr...

Producing efficient error-bounded solutions for transition independent decentralized MDPs

There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...

Publication date: 2015